{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we'll look at interfacing between the composability and ability to generate complex visualizations that HoloViews provides, the power of [pandas](http://pandas.pydata.org/) library dataframes for manipulating tabular data, and the great-looking statistical plots and analyses provided by the [Seaborn](http://stanford.edu/~mwaskom/software/seaborn) library.\n",
    "\n",
    "This tutorial assumes you're already familiar with some of the core concepts of HoloViews, which are explained in the [other Tutorials](index)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import itertools\n",
    "\n",
    "import numpy as np\n",
    "import seaborn as sb\n",
    "import holoviews as hv\n",
    "\n",
    "np.random.seed(9221999)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now select static and animation backends:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hv.notebook_extension()\n",
    "%output holomap='widgets' fig='svg'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing Distributions of Data <a id='Histogram'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If ``import seaborn`` succeeds, HoloViews will provide a number of additional ``Element`` types, including ``Distribution``, ``Bivariate``, ``TimeSeries``, ``Regression``, and ``DFrame`` (a ``Seaborn``-visualizable version of the ``DFrame`` ``Element`` class provided when only pandas is available).\n",
    "\n",
    "We'll start by generating a number of ``Distribution`` ``Element``s containing normal distributions with different means and standard deviations and overlaying them. Using the ``%%opts`` magic you can specify specific plot and style options as usual; here we deactivate the default histogram and shade the kernel density estimate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts Distribution (hist=False kde_kws=dict(shade=True))\n",
    "d1 = 25 * np.random.randn(500) + 450\n",
    "d2 = 45 * np.random.randn(500) + 540\n",
    "d3 = 55 * np.random.randn(500) + 590\n",
    "hv.Distribution(d1, label='Blue') *\\\n",
    "hv.Distribution(d2, label='Red') *\\\n",
    "hv.Distribution(d3, label='Yellow')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thanks to Seaborn you can choose to plot your distribution as histograms, kernel density estimates, and/or rug plots:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts Distribution (rug=True kde_kws={'color':'indianred','linestyle':'--'})\n",
    "hv.Distribution(np.random.randn(10), vdims=['Activity'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also visualize the same data with ``Bivariate`` distributions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts Bivariate (shade=True) Bivariate.A (cmap='Blues') Bivariate.B (cmap='Reds') Bivariate.C (cmap='Greens')\n",
    "hv.Bivariate(np.array([d1, d2]).T, group='A') +\\\n",
    "hv.Bivariate(np.array([d1, d3]).T, group='B') +\\\n",
    "hv.Bivariate(np.array([d2, d3]).T, group='C')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This plot type also has the option of enabling a joint plot with marginal distribution along each axis, and the ``kind`` option lets you control whether to visualize the distribution as a ``scatter``, ``reg``, ``resid``, ``kde`` or ``hex`` plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts Bivariate [joint=True] (kind='kde' cmap='Blues')\n",
    "hv.Bivariate(np.array([d1, d2]).T, group='A')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with ``TimeSeries`` data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next let's take a look at the ``TimeSeries`` View type, which allows you to visualize statistical time-series data. ``TimeSeries`` data can take the form of a number of observations of some dependent variable at multiple timepoints. By controlling the plot and style option the data can be visualized in a number of ways, including confidence intervals, error bars, traces or scatter points."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's begin by defining a function to generate sine-wave time courses with varying phase and noise levels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sine_wave(n_x, obs_err_sd=1.5, tp_err_sd=.3, phase=0):\n",
    "    x = np.linspace(0+phase, (n_x - 1) / 2+phase, n_x)\n",
    "    y = np.sin(x) + np.random.normal(0, obs_err_sd) + np.random.normal(0, tp_err_sd, n_x)\n",
    "    return y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can create HoloMaps of sine and cosine curves with varying levels of observational and independent error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sine_stack = hv.HoloMap(kdims=['Observation error','Random error'])\n",
    "cos_stack = hv.HoloMap(kdims=['Observation error', 'Random error'])\n",
    "for oe, te in itertools.product(np.linspace(0.5,2,4), np.linspace(0.5,2,4)):\n",
    "    sines = np.array([sine_wave(31, oe, te) for _ in range(20)])\n",
    "    sine_stack[(oe, te)] = hv.TimeSeries(sines, label='Sine', group='Activity',\n",
    "                                         kdims=['Time', 'Observation'])\n",
    "    cosines = np.array([sine_wave(31, oe, te, phase=np.pi) for _ in range(20)])\n",
    "    cos_stack[(oe, te)]  = hv.TimeSeries(cosines, group='Activity',label='Cosine', \n",
    "                                         kdims=['Time', 'Observation'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First let's visualize the sine stack with a confidence interval:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts TimeSeries (ci=95 color='indianred')\n",
    "sine_stack"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the cosine stack with error bars:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts TimeSeries (err_style='ci_bars')\n",
    "cos_stack.last"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the ``%%opts`` cell magic has applied the style to each object individually, we can now overlay the two with different visualization styles in the same plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cos_stack.last * sine_stack.last"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with pandas DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to make this a little more interesting, we can use some of the real-world datasets provided with the Seaborn library. The holoviews ``DFrame`` object can be used to wrap the Seaborn-generated pandas dataframes like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris = hv.DFrame(sb.load_dataset(\"iris\"))\n",
    "tips = hv.DFrame(sb.load_dataset(\"tips\"))\n",
    "titanic = hv.DFrame(sb.load_dataset(\"titanic\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%output fig='png' dpi=100 size=150"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iris Data  <a id='Box'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualize the relationship between sepal length and width in the Iris flower dataset. Here we can make use of some of the inbuilt Seaborn plot types, starting with a ``pairplot`` that can plot each variable in a dataset against each other variable. We can customize this plot further by passing arguments via the style options, to define what plot types the ``pairplot`` will use and define the dimension to which we will apply the hue option. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts DFrame (diag_kind='kde' kind='reg' hue='species')\n",
    "iris.clone(label=\"Iris Data\", plot_type='pairplot')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When working with a ``DFrame`` object directly, you can select particular columns of your ``DFrame`` to visualize by supplying ``x`` and ``y`` parameters corresponding to the ``Dimension``s or columns you want visualize. Here we'll visualize the ``sepal_width`` and ``sepal_length`` by species as a box plot and violin plot, respectively. By switching the ``x`` and ``y`` arguments we can draw either a vertical or horizontal plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts DFrame [show_grid=False]\n",
    "iris.clone(x='sepal_width', y='species', plot_type='boxplot') +\\\n",
    "iris.clone(x='species', y='sepal_width', plot_type='violinplot')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Titanic passenger data  <a id='Correlation'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Titanic passenger data is a truly large dataset, so we can make use of some of the more advanced features of Seaborn and pandas. Above we saw the usage of a ``pairgrid``, which allows you to quickly compare each variable in your dataset. HoloViews also support Seaborn based [FacetGrids](http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html#subsetting-data-with-facetgrid). The ``FacetGrid`` specification is simply passed via the style options, where the ``map`` keyword should be supplied as a tuple of the plotting function to use and the ``Dimension``s to place on the x axis and y axis. You may also specify the ``Dimension``s to lay out along the ``row``s and ``col``umns of the plot, and the ``hue`` groups:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts DFrame (map=('barplot', 'alive', 'age') col='class' row='sex' hue='pclass' aspect=1.0)\n",
    "titanic.clone(plot_type='facetgrid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "FacetGrids support most Seaborn and matplotlib plot types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts DFrame (map=('regplot', 'age', 'fare') col='class' hue='class')\n",
    "titanic.clone(plot_type='facetgrid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the Seaborn plot types and pandas interface provide substantial additional capabilities to HoloViews, while HoloViews allows simple animation, combinations of plots, and visualization across parameter spaces.  Note that the ``DFrame`` ``Element`` is still available even if Seaborn is not installed, but it will use the standard ``HoloViews`` visualizations rather than ``Seaborn`` in that case."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
